Early classification of time series as a non myopic sequential decision making problem
Classifying time series as early as possible is a valuable goal. Indeed, in many application domains, the earlier the decision, the more rewarding it can be. Yet, often, gathering more information allows one to make a better decision. The optimization of this time vs. accuracy trade-off must generally be solved online and is a complex problem. This paper presents a formal criterion that expresses this trade-off in all generality, together with a generic sequential meta-algorithm to solve it. This meta-algorithm is interesting in two ways. First, it pinpoints where choices can (or have to) be made to obtain a computable algorithm; as a result, a wealth of algorithmic solutions can be derived. Second, it seeks online the earliest time in the future at which a minimum of the criterion can be expected, thus going beyond the classical approaches that myopically decide at each time step whether to commit to a decision or to postpone it by one more step. After expounding this general setting, we study one simple instantiation of the meta-algorithm and report results obtained on synthetic and real time series data sets chosen for their ability to test the robustness and properties of the technique. The experimental results vindicate the general approach and point to promising perspectives.
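The non-myopic idea above can be sketched as a cost criterion minimized over future time steps: at each step, the system looks for the earliest future time at which the expected cost (error probability plus delay penalty) is minimal, and decides only when that time is now. A minimal sketch, assuming a toy hand-written error-probability estimate `p_err` and a linear delay cost — both illustrative placeholders, not the paper's actual criterion or estimator:

```python
import math

def expected_cost(tau, p_error_at, delay_cost):
    """Expected cost of deciding at time tau: estimated misclassification
    probability plus a linear penalty for waiting (illustrative choice)."""
    return p_error_at(tau) + delay_cost * tau

def earliest_optimal_time(t_now, horizon, p_error_at, delay_cost=0.01):
    """Earliest future time step minimizing the expected cost criterion.
    min() returns the first minimizer, hence 'earliest'."""
    return min(range(t_now, horizon + 1),
               key=lambda tau: expected_cost(tau, p_error_at, delay_cost))

# Toy error estimate that decreases as more of the series is observed.
p_err = lambda tau: 0.5 * math.exp(-0.1 * tau)

t = 0
while earliest_optimal_time(t, 100, p_err) > t:
    t += 1  # postpone: a lower expected cost is anticipated later
print(t)  # decision triggered at the anticipated optimum
```

Unlike a myopic trigger, which only compares "decide now" against "wait one step", the loop commits exactly when the anticipated minimum of the criterion coincides with the current time.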
Automatic Feature Engineering for Time Series Classification: Evaluation and Discussion
Time Series Classification (TSC) has received much attention in the past two
decades and remains a crucial and challenging problem in data science and
knowledge engineering. Indeed, along with the increasing availability of time
series data, many TSC algorithms have been proposed by the research community.
Besides state-of-the-art methods based on similarity measures, intervals,
shapelets, dictionaries, deep learning, or hybrid ensembles, several tools for
extracting unsupervised informative summary statistics, aka features, from
time series have been designed in recent years. Although originally intended
for descriptive analysis and visualization of time series with informative
and interpretable features, very few of these feature engineering tools have
been benchmarked on TSC problems and compared with state-of-the-art TSC
algorithms in terms of predictive performance. In this article, we aim to fill
this gap and propose a simple TSC process to evaluate the potential predictive
performance of the feature sets obtained with existing feature engineering
tools. We present an empirical study of 11 feature engineering tools combined
with 9 supervised classifiers over 112 time series data sets. The analysis of
the results of more than 10,000 learning experiments indicates that
feature-based methods perform as accurately as current state-of-the-art TSC
algorithms, and thus should rightfully be considered further in the TSC
literature.
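The feature-based TSC process described above amounts to a two-stage pipeline: extract summary statistics from each series, then train an ordinary classifier in that feature space. A heavily simplified sketch, assuming a tiny hand-picked feature set and a nearest-centroid classifier as a stand-in for the 11 tools and 9 classifiers benchmarked in the paper:

```python
import numpy as np

def summary_features(series):
    """A few interpretable summary statistics of a 1-D series
    (illustrative stand-in for a feature engineering tool)."""
    x = np.asarray(series, dtype=float)
    t = np.arange(len(x))
    slope = np.polyfit(t, x, 1)[0]              # linear trend
    ac1 = np.corrcoef(x[:-1], x[1:])[0, 1]      # lag-1 autocorrelation
    return np.array([x.mean(), x.std(), slope, ac1])

def fit_centroids(X, y):
    """Nearest-centroid classifier trained in feature space."""
    feats = np.array([summary_features(s) for s in X])
    return {c: feats[np.array(y) == c].mean(axis=0) for c in set(y)}

def predict(centroids, series):
    f = summary_features(series)
    return min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))

# Toy data: noisy upward vs. downward trends.
rng = np.random.default_rng(0)
up = [np.arange(50) * 0.1 + rng.normal(0, 0.2, 50) for _ in range(5)]
down = [-np.arange(50) * 0.1 + rng.normal(0, 0.2, 50) for _ in range(5)]
model = fit_centroids(up + down, ["up"] * 5 + ["down"] * 5)
print(predict(model, np.arange(50) * 0.1))  # an unseen upward trend
```

The appeal of the feature-based route is that the resulting representation is fixed-length and interpretable, so any off-the-shelf tabular classifier can be plugged into the second stage.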
Biquality Learning: a Framework to Design Algorithms Dealing with Closed-Set Distribution Shifts
Training machine learning models from data with weak supervision and dataset
shifts is still challenging. Designing algorithms for the case where these two
situations arise together has not been explored much, and existing algorithms
cannot always handle the most complex distributional shifts. We argue that the
biquality data setup is a suitable framework for designing such algorithms.
Biquality Learning assumes that two datasets are available at training time: a
trusted dataset sampled from the distribution of interest, and an untrusted
dataset affected by dataset shifts and weaknesses of supervision (aka
distribution shifts). Having both datasets available at training time makes it
possible to design algorithms dealing with any distribution shift. We propose
two methods, one inspired by the label noise literature and the other by the
covariate shift literature, for biquality learning. We also experiment with
two novel methods to synthetically introduce concept drift and
class-conditional shifts into many real-world datasets. Finally, we discuss
open questions and conclude that developing biquality learning algorithms
robust to distributional changes remains an interesting problem for future
research.
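A classic covariate-shift-style use of a trusted/untrusted pair is importance weighting: reweight each untrusted sample by the density ratio between the trusted and untrusted feature distributions, so that the untrusted data behaves as if drawn from the distribution of interest. A crude 1-D sketch, assuming both distributions are Gaussian — an illustrative simplification, not the paper's method:

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def importance_weights(x_trusted, x_untrusted):
    """Density-ratio weights p_trusted(x) / p_untrusted(x), with both
    densities fitted as 1-D Gaussians (illustrative assumption)."""
    wt = gaussian_density(x_untrusted, x_trusted.mean(), x_trusted.std())
    wu = gaussian_density(x_untrusted, x_untrusted.mean(), x_untrusted.std())
    return wt / wu

# Toy covariate shift: untrusted features drawn from a shifted distribution.
rng = np.random.default_rng(1)
x_t = rng.normal(0.0, 1.0, 500)   # trusted sample, distribution of interest
x_u = rng.normal(1.0, 1.0, 2000)  # untrusted, covariate-shifted sample
w = importance_weights(x_t, x_u)

# Reweighting pulls the untrusted mean back toward the trusted one.
print(x_u.mean(), np.average(x_u, weights=w))
```

In a full biquality learner these weights would multiply the per-sample loss of a classifier trained on the untrusted data; here only the reweighted mean is shown to make the correction visible.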
ECOTS: Early Classification in Open Time Series
Learning to predict events ahead of time in open time series is challenging.
Early Classification of Time Series (ECTS) tackles the problem of balancing
online the accuracy of the prediction against the cost of delaying the
decision, but only when the individuals are time series of finite length with
a unique label for the whole series. Surprisingly, this trade-off has never
been investigated for open time series of undetermined length, where each
subsequence of the same series may belong to a different class. In this paper,
we propose a principled method to adapt any ECTS technique to Early
Classification in Open Time Series (ECOTS). We show how the classifiers must
be constructed and what the decision triggering system becomes in this new
scenario. We address the challenge of decision making in the predictive
maintenance field, illustrate our methodology by transforming two
state-of-the-art ECTS algorithms to the ECOTS scenario, and report numerical
experiments on a real predictive maintenance dataset that demonstrate the
practicality of the novel approach.
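One way to picture the ECTS-to-ECOTS shift is that the classifier and trigger no longer see one finite series with one label, but a fixed-width window sliding over an unbounded stream, firing a decision whenever the trigger condition is met. A minimal sketch, assuming a hypothetical `window_score` (the recent upward drift of the signal) as a stand-in for a trained classifier-plus-trigger pair:

```python
import numpy as np

def window_score(window):
    """Hypothetical confidence that an event is imminent: here simply the
    mean step-to-step increase over the window (stand-in for a trained
    ECTS classifier applied to a sliding window)."""
    return float(np.mean(np.diff(window)))

def ecots_triggers(stream, width=10, threshold=0.5):
    """Slide a fixed-width window over an open-ended stream and collect
    every time step at which the trigger fires, instead of the single
    decision an ECTS system makes on a finite series."""
    out = []
    for t in range(width, len(stream) + 1):
        if window_score(stream[t - width:t]) > threshold:
            out.append(t)
    return out

# Toy open series: flat, then a steep ramp (the "event"), then flat again.
stream = np.concatenate([np.zeros(30), np.arange(0, 10, 1.0), np.full(30, 9.0)])
print(ecots_triggers(stream))  # fires only while the ramp crosses the window
```

The key structural change this illustrates is that decisions become recurrent and window-relative: the same stream yields a run of trigger times around the event rather than one label for the whole series.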
Open challenges for Machine Learning based Early Decision-Making research
More and more applications require early decisions, i.e. decisions made as
soon as possible from partially observed data. However, the later a decision
is made, the more its accuracy tends to improve, since the description of the
problem at hand is enriched over time. Such a compromise between the earliness
and the accuracy of decisions has been particularly studied in the field of
Early Time Series Classification. This paper introduces a more general
problem, called Machine Learning based Early Decision Making (ML-EDM), which
consists in optimizing the decision times of models in a wide range of
settings where data is collected over time. After defining the ML-EDM problem,
ten challenges are identified and proposed to the scientific community to
further research in this area. These challenges open important application
perspectives, which are discussed in this paper.
Measurement of differential cross sections for top quark pair production using the lepton plus jets final state in proton-proton collisions at 13 TeV
Particle-flow reconstruction and global event description with the CMS detector
The CMS apparatus was identified, a few years before the start of the LHC operation at CERN, to feature properties well suited to particle-flow (PF) reconstruction: a highly-segmented tracker, a fine-grained electromagnetic calorimeter, a hermetic hadron calorimeter, a strong magnetic field, and an excellent muon spectrometer. A fully-fledged PF reconstruction algorithm tuned to the CMS detector was therefore developed and has been consistently used in physics analyses for the first time at a hadron collider. For each collision, the comprehensive list of final-state particles identified and reconstructed by the algorithm provides a global event description that leads to unprecedented CMS performance for jet and hadronic tau decay reconstruction, missing transverse momentum determination, and electron and muon identification. This approach also allows particles from pileup interactions to be identified and enables efficient pileup mitigation methods. The data collected by CMS at a centre-of-mass energy of 8 TeV show excellent agreement with the simulation and confirm the superior PF performance at least up to an average of 20 pileup interactions